On the Importance of Parameter Tuning in Text Categorization

نویسندگان

  • Cornelis H. A. Koster
  • Jean Beney
چکیده

Text Categorization algorithms have a large number of parameters that determine their behaviour, whose effect is not easily predicted objectively or intuitively and may very well depend on the corpus or on the document representation. Their values are usually taken over from previously published results, which may lead to less than optimal accuracy in experimenting on particular corpora. In this paper we investigate the effect of parameter tuning on the accuracy of two Text Categorization algorithms: the well-known Rocchio algorithm and the lesser-known Winnow. We show that the optimal parameter values for a specific corpus are sometimes very different from those found in literature. We show that the effect of individual parameters is corpus-dependent, and that parameter tuning can greatly improve the accuracy of both Winnow and Rocchio. We argue that the dependence of the categorization algorithms on experimentally established parameter values makes it hard to compare the outcomes of different experiments and propose the automatic determination of optimal parameters on the train set as a solution.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Importance of Parameter Tuning in Text Categorization

Text Categorization algorithms have a large number of parameters that determine their behaviour, whose effect is not easily predicted objectively or intuitively and may very well depend on the corpus or on the document representation. Their values are usually taken over from previously published results. In this article we investigate the effect of parameter tuning on the accuracy of two Text C...

متن کامل

Efficient and Robust Parameter Tuning for Heuristic Algorithms

The main advantage of heuristic or metaheuristic algorithms compared to exact optimization methods is their ability in handling large-scale instances within a reasonable time, albeit at the expense of losing a guarantee for achieving the optimal solution. Therefore, metaheuristic techniques are appropriate choices for solving NP-hard problems to near optimality. Since the parameters of heuristi...

متن کامل

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...

متن کامل

Text Categorization Problem

Document categorization problem gained a lot of importance in the last years due to the increase in the number of digital documents. This paper analyzes the performance of different classification algorithms on text categorization problem. Importance of parameter optimization on the performance of the algorithms is also discussed. The paper mostly focuses on the SVM (Support Vector Machines) al...

متن کامل

Tuning Shape Parameter of Radial Basis Functions in Zooming Images using Genetic Algorithm

Image zooming is one of the current issues of image processing where maintaining the quality and structure of the zoomed image is important. To zoom an image, it is necessary that the extra pixels be placed in the data of the image. Adding the data to the image must be consistent with the texture in the image and not to create artificial blocks. In this study, the required pixels are estimated ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006